A Cross-Layer Runtime Framework for Checkpoint-based Soft-Error and Aging Management in SoCs

Authors

  • Venkata Yaswanth Raparti
  • Sudeep Pasricha
Abstract

Transient faults due to single and multiple bit-flips and permanent aging effects due to Bias Temperature Instability (BTI) and Hot Carrier Injection (HCI) gradually reduce chip reliability over time. Unfortunately, the increasingly stringent on-chip dark-silicon power constraints prohibit costly fault resilience solutions. Clearly, a viable approach is needed that can address both transient and aging-induced faults in emerging multicore chips. In this paper, we propose a novel runtime framework (CHARM) to manage the useful chip lifetime, while also addressing transient faults and meeting dark-silicon and application performance constraints. Experimental results on a 60-core chip multiprocessor show that CHARM achieves an improvement of up to 2.5× in lifetime, up to 5× in resiliency to soft errors, and up to 6× in number of applications executed over the chip lifetime compared to a state-of-the-art solution.

Publication date: 2016